ABSTRACT


Some recent work in the area of learning structural descriptions from examples
is reviewed in light of the need in many diverse disciplines for programs which
can perform conceptual data analysis. Such programs describe complex data in
terms of logical, functional, and causal relationships which cannot be
discovered using traditional data analysis techniques. Various important
aspects of the problem of learning structural descriptions are examined and
criteria for evaluating current work are presented. Methods published by
Buchanan et al. [1-3,20], Hayes-Roth [6-9], and Vere [22-25] are analyzed
according to these criteria and compared to a method developed by the authors.
Finally, some goals are suggested for future research.
1.  INTRODUCTION

1.1  Motivation and Basic Concepts

There are many problem areas where large volumes of data are generated
about a class of objects, the behavior of a system, a process, etc.
Scientists in fields as diverse as agriculture, chemistry, and psychology are
faced with the need to analyze such data in order to detect regularities
and common patterns. Traditional tools for data analysis include various
statistical techniques, curve-fitting techniques, numerical taxonomy, etc.
These methods, however, are often not satisfactory because they impose an
overly restrictive mathematical framework on the scope of possible
solutions. For example, statistical methods describe the data in terms of
probability distribution functions placed on random variables. As a result,
the types of patterns which they can discover are limited to those which
can be expressed by placing constraints upon the parameters of various
probability distribution functions. Because of the mathematical frameworks
upon which they are based, traditional methods cannot detect conceptual
patterns such as the logical, causal, or functional relationships that are
typical of descriptions produced by humans. This is a well-known problem
in AI, namely that a system, in order to learn something, must first be able
to express it. The solution requires introducing more powerful
representations for hypotheses and developing corresponding techniques of data
analysis and pattern discovery. Work done in AI and related areas on
computer induction and learning structural descriptions from examples has laid
the groundwork for research in this area. This is not accidental, because,
as Michie [17] has pointed out, the development of systems which deal with
problems in human conceptual terms is a fundamental characteristic of AI
research.

In this paper, we examine some of the recent work in AI on the subject of
learning and generalization of structural descriptions. In particular, we
review four recent methods of inductive generalization: Buchanan et al.,
Hayes-Roth, Vere, and our own work. (Earlier well-known work by Winston
was recently reviewed by Knapman [10].) We also outline some goals for
research in this area. Attention is given primarily to the simplest form of
generalization, namely the maximally specific conjunctive statements which
characterize a single set of input events (called, for short, conjunctive
generalizations). The reason for this choice is that most work done in
this area addresses this quite restricted subject. Many of the
researchers whose work we review in this paper have done work on other
aspects of machine learning, including generalization using negative examples
(Vere, Michalski) and developing discriminant descriptions of several
classes of objects (Michalski). Due to space limitations, we have been
unable to include these topics in this paper. Instead, these contributions
are mentioned in the sections concerning extensions. We begin the analysis
by first discussing several important aspects of the problem of learning
conceptual descriptions:

•  types of descriptions: characteristic versus discriminant

•  forms of descriptions

•  types of generalization processes involved in generalizing descriptions
   (rules of generalization)

•  constructive versus non-constructive induction

•  general versus problem-oriented methods of induction.
1.2  Types of Descriptions

We distinguish between characteristic and discriminant descriptions [15].
A characteristic description is a description of a single set of objects
(examples, events) which is intended to discriminate that set of objects
from all other possible objects. For example, a characteristic description
of the set of all tables would discriminate any table from all things which
are non-tables. Psychologists consider this problem under the name of
concept formation (e.g., Hunt [9]). Since it is impossible to examine all
other possible objects, a characteristic description is usually developed by
specifying all characteristics which are true for all known objects of the
class (positive examples). Alternatively, in some problems there are
available so-called "near misses" which can be used to circumscribe the
given class more precisely.

A  discriminant  description  is  a  description  of  a  single  class  of  objects  in 
the  context  of  a  fixed  set  of  other  classes  of  objects.  It  states  only 
those  properties  of  objects  in  the  class  under  consideration  which  are 
necessary  to  distinguish  them  from  the  objects  in  the  other  classes.  A 
characteristic  description  can  be  viewed  as  a  discriminant  description  in 
which  the  given  class  is  discriminated  against  infinitely  many  alternative 
classes.

In this paper we restrict ourselves to the problem of determining
characteristic descriptions. The problem of determining discriminant
descriptions has been studied by Michalski and his collaborators [12-16].

1.3  Forms of Descriptions

Descriptions, either characteristic or discriminant, may take several
forms. In this paper we concentrate on generalizations in conjunctive
form. Other forms include disjunctions, exceptions, production rules of
various types, hierarchical and multilevel descriptions, semantic nets, and
frames.

1.4  Generalization Rules

The process of inducing a general description from examples can be viewed
as a process of applying certain generalization rules to the initial
descriptions to transform them into more general output descriptions. This
viewpoint permits one to characterize various methods of induction by
specifying the rules of generalization which they use. Below is a brief
review of various generalization rules based on the paper [16].

i)  Dropping Condition Rule. If a description is viewed as a conjunction
of conditions which must be satisfied, then one way to generalize it
is to drop one or more of these conditions. For example:

red(x) & big(x)   |<   red(x)

(this reads: "the description 'xs which are red and big' can be generalized
to the description 'xs which are red'"; |< denotes the generalization
operator)
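
The dropping condition rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a conjunction is modelled as a set of condition strings, and the helper names `drop_conditions` and `covers` are hypothetical, not taken from any of the programs reviewed here.

```python
# Assumption for this sketch: a conjunctive description is a frozenset of
# condition strings, and an event is the set of conditions true of one object.

def drop_conditions(description, to_drop):
    """Generalize a conjunction by removing some of its conditions."""
    return frozenset(description) - frozenset(to_drop)

def covers(description, event):
    """A conjunction covers an event if every one of its conditions holds."""
    return frozenset(description) <= frozenset(event)

specific = frozenset({"red(x)", "big(x)"})
general = drop_conditions(specific, {"big(x)"})

big_red_object = {"red(x)", "big(x)"}
small_red_object = {"red(x)", "small(x)"}

# The generalized description covers everything the specific one covers,
# plus objects that fail the dropped condition.
print(covers(specific, small_red_object), covers(general, small_red_object))
```

Dropping a condition can only enlarge the set of covered objects, which is why the rule is a generalization.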

ii)  Turning Constants to Variables Rule. If we have two or more
descriptions, each of which refers to a specific object (in a set to be
characterized), we can generalize these by creating one description which
contains a variable in place of the specific object:

tall(Fred) & man(Fred)  |
                        |<   tall(x) & man(x)
tall(Jim) & man(Jim)    |

assuming that the value set of x is {Fred, Jim, ... }. 'x' can be
interpreted as representing 'a person from the group under consideration.'
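
A minimal sketch of this rule, assuming each description is a set of (predicate, constant) pairs about exactly one object. The function name `constants_to_variable` is hypothetical. The sketch also keeps only the literals shared by all descriptions, so it silently applies the dropping condition rule to anything that is not common.

```python
def constants_to_variable(descriptions, variable="x"):
    """Replace the object constant of each description by one shared variable.

    Returns the literals common to all lifted descriptions together with the
    value set of the variable (the constants it replaced).
    """
    value_set = set()
    lifted = []
    for desc in descriptions:
        constants = {const for _, const in desc}
        if len(constants) != 1:
            raise ValueError("each description must refer to exactly one object")
        value_set |= constants
        lifted.append({(pred, variable) for pred, _ in desc})
    common = set.intersection(*lifted)
    return common, value_set

fred = {("tall", "Fred"), ("man", "Fred")}
jim = {("tall", "Jim"), ("man", "Jim")}
general, values = constants_to_variable([fred, jim])
print(sorted(general), sorted(values))
```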

These  first  two  rules  of  generalization  are  the  rules  most  commonly  used  in 
the  literature  on  computer  induction.  Both  rules  can,  however,  be  viewed 
as  special  cases  of  the  following  rule. 

iii)  Generalizing by Internal Disjunction Rule. A description can be
generalized by extending the set of values that a descriptor (i.e., variable,
function, or predicate) is permitted to take on in order that the
description is satisfied. This process involves an operation called the
internal disjunction. For example:

shape(x, square)    |
                    |<   shape(x, (square or triangle or rectangle))
shape(x, triangle)  |

where statements on the left of |< describe some single objects in a class,
and the statement on the right is a plausible generalization.


Using the notation of variable-valued logic system VL21 [16] this rule can
be expressed somewhat more compactly:

[shape(x)=square]    |
                     |<   [shape(x)=square, triangle, rectangle]
[shape(x)=triangle]  |

The ',' in the expression on the right of the |< denotes the internal
disjunction. Although it may seem at first glance that the internal
disjunction is just a notational abbreviation, this operation appears to be
one of the fundamental operations people use in generalizing descriptions.
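
As an illustration, a selector such as [shape(x)=square, triangle] can be modelled as a descriptor name paired with a set of permitted values. The sketch below assumes that representation; `internal_disjunction` and `satisfies` are hypothetical names. It returns the smallest value set that covers the inputs, and any superset (such as the one in the example above, which also admits rectangle) is likewise a valid generalization.

```python
def internal_disjunction(selectors):
    """Merge selectors on the same descriptor by taking the union of values."""
    descriptors = {descriptor for descriptor, _ in selectors}
    if len(descriptors) != 1:
        raise ValueError("all selectors must constrain the same descriptor")
    values = frozenset().union(*(vals for _, vals in selectors))
    return descriptors.pop(), values

def satisfies(selector, obj):
    """An object satisfies a selector if its value is one of those permitted."""
    descriptor, values = selector
    return obj.get(descriptor) in values

merged = internal_disjunction([("shape", {"square"}), ("shape", {"triangle"})])
print(merged[0], sorted(merged[1]))
```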

In general this rule can be expressed:

W[L = R1]   |<   W[L = R2]

where W is some condition and where R1 and R2 are sets of values linked by
internal disjunction, and R1 ⊆ R2.

There are two other important special cases of this rule. The first arises
when the descriptor involved takes on values which are linearly ordered (a
linear descriptor), and the second when the descriptor takes on values which
represent concepts at various levels of generality (a structured
descriptor).

In the case of a linear descriptor we have:

iv)  Closing Interval Rule. For example, suppose two objects of the
same class have all the same characteristics except that they have
different sizes, a and b. Then, it is plausible to hypothesize that all
objects which share these characteristics but which have sizes between a
and b are also in this class.

W[size(x1)=a]   |
                |<   W[size(x)=a..b]
W[size(x2)=b]   |
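
For a numeric linear descriptor the rule amounts to taking the smallest interval that contains the observed values. The sketch below assumes sizes are plain numbers; the function names are hypothetical.

```python
def close_interval(sizes):
    """Return the closed interval (low, high) spanning the observed values."""
    sizes = list(sizes)
    return min(sizes), max(sizes)

def in_interval(interval, value):
    """True if value lies inside the closed interval."""
    low, high = interval
    return low <= value <= high

interval = close_interval([3, 7])   # sizes a = 3 and b = 7
print(interval, in_interval(interval, 5), in_interval(interval, 8))
```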

In the case of structured descriptors we have:

v)  Climbing Generalization Tree Rule. Suppose the value set of the
shape descriptor is the tree of concepts:

              plane geometric figure
               /                  \
          polygon              oval figure
          /     \               /      \
    triangle  rectangle     ellipse   circle

With this tree structure, values such as triangle and rectangle can be
generalized by climbing the generalization tree:

[shape(x)=rectangle]  |
                      |<   [shape(x)=polygon]
[shape(x)=triangle]   |
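
Climbing the tree can be sketched as finding the lowest node that lies above every observed value. The code assumes the shape tree has polygon above triangle and rectangle, and oval figure above ellipse and circle; the `PARENT` table and the function names are hypothetical.

```python
# Child -> parent links of the assumed shape generalization tree.
PARENT = {
    "polygon": "plane geometric figure",
    "oval figure": "plane geometric figure",
    "triangle": "polygon",
    "rectangle": "polygon",
    "ellipse": "oval figure",
    "circle": "oval figure",
}

def ancestors(value):
    """Return the path from value up to the root, starting with value itself."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def climb_tree(values):
    """Return the most specific node that is at or above every given value."""
    values = list(values)
    common = set(ancestors(values[0]))
    for value in values[1:]:
        common &= set(ancestors(value))
    # Walking up from the first value, the first shared node is the lowest one.
    for node in ancestors(values[0]):
        if node in common:
            return node
    raise ValueError("values do not share a root")

print(climb_tree(["rectangle", "triangle"]))
```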


1.5  Constructive Induction

Most methods of induction produce descriptions which involve the same
descriptors which were present in the initial data. These methods operate
by selecting descriptors from the input data and putting them into a form
which is an appropriate generalization. Such methods perform
non-constructive induction. A method performs constructive induction if it
includes mechanisms which can generate new descriptors not present in the
input data. These new descriptors are generated by applying rules of
constructive induction. Such rules may be written as procedures or as
production rules and may be based on general knowledge or on problem-oriented
knowledge (for examples of constructive generalization rules see [16]).
Constructive induction rules can interpret the input data in terms of
knowledge about the problem domain. Frequently, the solution to a problem
is dependent upon finding the proper description for the problem, as in the
mutilated checkerboard problem. An inductive program should contain
facilities for constructive induction, including a library of general
constructive induction rules. The user should be able to suggest new rules for
the program to examine. In order to activate those rules which would be most
useful, the program must be able to efficiently search the space of
possible constructive induction rules.

Programs which perform constructive induction are more likely to find
useful and interesting patterns in complex data since they have the ability to
examine the data using many different representations.

1.6  General versus Problem-oriented Methods

It is a common view that general methods of induction, although
mathematically elegant and theoretically applicable to many problems, are in
practice very inefficient and rarely lead to any interesting solutions. This
opinion seems to have led certain workers to abandon (at least temporarily)
work on general methods and concentrate on some specific problem (e.g.,
Buchanan et al. [1,2,3] or Lenat [11]). This approach often leads to
interesting and practical solutions. On the other hand, it is often
difficult to extract general principles of induction from such problem-specific
work. It is also difficult to apply such special-purpose programs to new
areas.

An attractive possibility for solving this dilemma is to develop methods
which incorporate various general principles of induction (including
constructive induction) together with mechanisms for using exchangeable
packages of problem-specific knowledge. In this way a general method of
induction, provided with an appropriate package of knowledge, could be both
easily applicable to different problems and also efficient and practically
useful. This idea underlies the development of the INDUCE programs
[13,15,16].

2.  COMPARATIVE  REVIEW  OF  SELECTED  METHODS 

2.1  Evaluation  Criteria 

We evaluate the selected methods of induction in terms of several criteria
considered especially important in view of the remarks in section 1.

i)  Adequacy of the representation language. The language used to
represent input data and output generalizations determines to a large
extent the quality and usefulness of the output descriptions. Although it is
difficult to assess the adequacy of a representation language out of the
context of some specific problem, recent work in AI has shown that
languages which treat all phenomena uniformly must sacrifice descriptive
precision. For example, researchers who are attempting to build
natural-language systems prefer the richer knowledge representations such as
frames and semantic nets (with their tremendous variety of syntactic forms) to
more uniform and less structured representations such as attribute-value
lists and PLANNER-style databases. In our own work on inductive learning,
we have chosen to use the representation language VL21 (see below) which
has a wider variety of syntactic forms than our earlier language VL1.
Although languages with many syntactic forms do provide greater descriptive
precision, they also make the induction process more complex. In order to
control this complexity, a compromise must be sought between uniformity and
richness of forms. In the evaluation of each method, a review of the
operators and syntactic forms of each description language is provided.
ii)  Rules of generalization implemented. The generalization rules
implemented in each algorithm are listed.

iii)  Computational efficiency. The exact analysis of the computational
efficiency of these algorithms is very difficult due both to the inherent
complexity of the algorithms and to the lack of precise formulations of the
algorithms in available publications. However, it seems useful to have
some data comparing the efficiency of these algorithms even if that data is
approximate and based on hand-simulations. To get some indication of the
efficiency we measure the total number of description generations or
comparisons required by each method to perform a test example (see Fig. 1).
We also measure the ratio of the number of output conjunctive
generalizations to the total number of generalizations examined on this example.
Since these numbers are derived from only one example, it is not
appropriate to draw strong conclusions from them concerning the general
performance of the algorithms. Our conclusions are based primarily on the
general behavior of the algorithms.

iv)  Flexibility and extensibility. Mere conjunctive characteristic
generalizations are not particularly useful for conceptual data analysis
because of their limited format and their lack of formal mechanisms for
handling errors in the input data. It is important in evaluating these
algorithms to consider the ease with which each method could be extended to

a)  discover descriptions with forms other than conjunctive generalizations
(see section 1.3),

b)  include mechanisms which facilitate the detection of errors in the
input data,

c)  provide a general facility for incorporating domain-specific knowledge
into the induction process as an exchangeable package (ideally, the
domain-specific knowledge should be isolated from the general-purpose
inductive process), and

d)  perform constructive induction.

It is difficult to assess the flexibility and extensibility of the
algorithms presented here. We base our evaluation on the general approaches of
the methods and on extensions which have already been made to them.

In the following sections, we describe each method by presenting the
description language used, sketching the underlying algorithm, and
evaluating the method in terms of the above criteria. Each method will be
illustrated using the test example shown in Fig. 1.


Figure 1.  The test example.

2.2  Data-driven Methods: Hayes-Roth and Vere.

Methods can be divided into bottom-up (data-driven), top-down
(model-driven), and mixed methods. Bottom-up methods generalize the input events
pairwise until the final conjunctive generalization is computed:
G1 = E1. G2 is the set of conjunctive generalizations of E1 and E2. Gi is
the set of conjunctive generalizations obtained by taking each element of
Gi-1 and generalizing it with Ei.
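
The bottom-up scheme can be sketched as a fold over the input events. The sketch below is not any of the reviewed programs: `bottom_up` is a hypothetical driver, and `intersect` is a toy pairwise generalizer which assumes events are sets of variable-free conditions, so that each pair has exactly one maximally specific generalization (the shared conditions).

```python
def bottom_up(events, generalize_pair):
    """Compute G1 = {E1}, then Gi from G(i-1) and Ei, and return the last Gi.

    generalize_pair(g, e) must return an iterable of conjunctive
    generalizations of the description g and the event e.
    """
    current = {events[0]}
    for event in events[1:]:
        following = set()
        for g in current:
            following.update(generalize_pair(g, event))
        current = following
    return current

def intersect(g, e):
    """Toy generalizer: keep only the conditions shared by g and e."""
    return [g & e]

events = [
    frozenset({"red", "big", "round"}),
    frozenset({"red", "big", "square"}),
    frozenset({"red", "small", "square"}),
]
print(bottom_up(events, intersect))
```

With structural descriptions the pairwise step can return several alternatives, which is why each Gi is a set rather than a single description.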


We consider here only the methods described by Hayes-Roth and Vere. Other
bottom-up methods include the candidate elimination approach described by
Mitchell [18] and the Uniclass method described by Stepp [20].

2.2.1   Hayes-Roth:  Program SPROUTER [5-?]


Hayes-Roth uses the term maximal abstraction or interference match for
maximally specific conjunctive generalization. He uses parameterized
structural representations (PSRs) to represent both the input events and their
generalizations. For example, consider the two events described in Fig. 2:


Figure 2  (events E1 and E2)

The PSRs for these could be

E1:  {{circle:a}{square:b}{small:a}
      {small:b}{ontop:a, under:b}}
E2:  {{circle:c}{square:d}{circle:e}
      {small:c}{large:d}{small:e}
      {ontop:c, under:d}
      {inside:e, outside:d}}

The expressions such as {small:a} are case frames made up of case labels
(small, circle, etc.) and parameters (a, b, c, d). The PSR can be
interpreted as a conjunction of predicates of the form small(a) where the
parameters are existentially quantified variables which are assumed to be
distinct.

The  interference  match  attempts  to  find  the  longest  one-to-one  match  of 
parameters  and  case  frames  (i.e.,  the  longest  common  subexpression).  This 
is  accomplished  in  two  steps.  First  the  case  relations  in  El  and  E2  are 
matched  in  all  possible  ways  to  obtain  the  set  M.  Two  case  relations  match 
if  all  of  their  case  labels  match.  Each  element  of  M  is  a  case  relation 
and  a  list  of  parameter  correspondences  which  permit  that  case  relation  to 
match  in  both  events: 

M = {{circle:((a/c)(a/e))}{square:((b/d))}
     {small:((a/c)(b/c)(a/e)(b/e))}
     {ontop,under:((a/c b/d))}}
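
The first step can be sketched as follows. The encoding is an assumption made for illustration: each case frame is a tuple of (label, parameter) pairs with labels listed in a fixed order, and `match_case_relations` is a hypothetical name, not SPROUTER's.

```python
from collections import defaultdict

# PSRs for E1 and E2, one tuple of (case label, parameter) pairs per frame.
E1 = [
    (("circle", "a"),), (("square", "b"),),
    (("small", "a"),), (("small", "b"),),
    (("ontop", "a"), ("under", "b")),
]
E2 = [
    (("circle", "c"),), (("square", "d"),), (("circle", "e"),),
    (("small", "c"),), (("large", "d"),), (("small", "e"),),
    (("ontop", "c"), ("under", "d")),
    (("inside", "e"), ("outside", "d")),
]

def match_case_relations(e1, e2):
    """Pair every frame of e1 with every frame of e2 that has the same labels.

    Returns a dict from the label tuple to the list of parameter
    correspondences, each a tuple of (e1 parameter, e2 parameter) pairs.
    """
    m = defaultdict(list)
    for f1 in e1:
        labels1 = tuple(label for label, _ in f1)
        for f2 in e2:
            labels2 = tuple(label for label, _ in f2)
            if labels1 == labels2:
                pairs = tuple((p1, p2) for (_, p1), (_, p2) in zip(f1, f2))
                m[labels1].append(pairs)
    return dict(m)

M = match_case_relations(E1, E2)
for labels, correspondences in M.items():
    print(labels, correspondences)
```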

The second step involves selecting a subset of the parameter
correspondences in M such that all parameters can be bound consistently. This is
conducted by a breadth-first search of the space of possible bindings with
pruning of unpromising nodes. The search can be visualized as a
node-building process. Here is one such (pruned) search:

[Figure: search tree built from M, leading to the interference match; one
node pairs {ontop,under} with the bindings a/c b/d]

The  nodes  are  numbered  in  order  of  generation.  One  at  a  time,  a  node  is 
examined  and  joined  with  all  other  consistent  nodes  which  have  already  been 
examined.  The  nodes  5,  8,  and  9  are  conjunctive  generalizations.  Node  9 
binds  a  to  c  (to  give  1)  and  b  to  d  (to  give  2)  to  produce  the  conjunction: 

{{circle:1}{square:2}{small:1}
 {ontop:1, under:2}}

The  node-building  process  is  guided  by  computing  a  utility  value  for  each 
candidate  node  to  be  built.  The  nodes  are  pruned  by  setting  an  upper  limit 
on  the  total  number  of  possible  nodes  and  pruning  nodes  of  low  utility  when 
that  limit  is  reached. 
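
The selection of a consistent set of bindings can be illustrated with an exhaustive search. This is a deliberate simplification and not the method described above: it tries subsets of M from largest to smallest instead of building nodes breadth-first with utility-based pruning. The flat list encoding of M and all function names are assumptions.

```python
from itertools import combinations

# M written as a flat list of (case labels, parameter correspondences).
M = [
    (("circle",), (("a", "c"),)),
    (("circle",), (("a", "e"),)),
    (("square",), (("b", "d"),)),
    (("small",), (("a", "c"),)),
    (("small",), (("b", "c"),)),
    (("small",), (("a", "e"),)),
    (("small",), (("b", "e"),)),
    (("ontop", "under"), (("a", "c"), ("b", "d"))),
]

def consistent(entries):
    """True if the pooled correspondences form a one-to-one binding."""
    forward, backward = {}, {}
    for _, pairs in entries:
        for p1, p2 in pairs:
            if forward.setdefault(p1, p2) != p2:
                return False
            if backward.setdefault(p2, p1) != p1:
                return False
    return True

def largest_consistent_subset(m):
    """Exhaustively find a largest subset of m with consistent bindings."""
    for size in range(len(m), 0, -1):
        for subset in combinations(m, size):
            if consistent(subset):
                return list(subset)
    return []

def to_conjunction(entries):
    """Rename each bound pair of parameters to a fresh integer."""
    names = {}
    return [
        tuple((label, names.setdefault(pair, len(names) + 1))
              for label, pair in zip(labels, pairs))
        for labels, pairs in entries
    ]

best = largest_consistent_subset(M)
print(to_conjunction(best))
```

On this example the largest consistent subset binds a to c and b to d, which reproduces the conjunction given for node 9. Exhaustive enumeration grows exponentially with the size of M, which is the cost that the pruned search is meant to avoid.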

Evaluation:

i)  Representational adequacy. The algorithm discovers the following
conjunctive generalizations of the example in Fig. 1:

1.  {{ontop:1, under:2}{medium:1}{clear:1}}
    There is a medium clear object on top of
    something.

2.  {{ontop:1, under:2}{medium:1}{large:2}
     {clear:2}}
    There is a medium object on top of a
    large, clear object.

3.  {{medium:1}{clear:1}{large:3}{clear:3}
     {shaded:2}}
    There is a medium sized clear object,
    a large sized clear object, and a
    shaded object.

PSRs provide two symbolic forms: parameters and case labels. The case
labels can express ordinary predicates and relations easily. Symmetric
relations may be expressed by using the same label twice, as in {same!size:a,
same!size:b}. The only operator is the conjunction. The language has no
disjunction or internal disjunction. As a result, the fact that the top
element in Fig. 1 is always either a square or a diamond cannot be
discovered.

ii)  Rules  of  generalization.  The  method  uses  the  dropping  condition 
and  turning  constants  to  variables  rules. 

iii)  Computational efficiency. On our test example, the algorithm
requires 22 comparisons and generates 20 candidate conjunctive
generalizations of which 6 are retained. This gives a figure of 6/20 or 30% for
computational efficiency. Four separate interference matches are required
since the first match of E1 and E2 produces three possible conjunctive
generalizations.

iv)  Flexibility and extensibility. Hayes-Roth has indicated (personal
communication) that this method has been extended to produce disjunctive
generalizations and to detect errors in data. Hayes-Roth has applied this
method to various problems in the design of the speech understanding system
Hearsay II. However, no facility has been developed for incorporating
domain-specific knowledge into the generalization process.

Also,  no  facility  for  constructive  induction  has  been  incorporated  although 
Hayes-Roth  has  developed  a  technique  for  converting  a  PSR  to  a  lower-level 
finer-grained  uniform  PSR.  This  transformation  permits  the  program  to 
develop  descriptions  which  involve  a  many-to-one  binding  of  parameters. 

2.2.2   Vere:  Program  Thoth  [21-24] 

Vere  uses  the  term  maximal  conjunctive  generalization  or  maximal  unifying 
generalization  to  denote  the  maximally  specific  conjunctive  generalization. 
Each event is represented as a conjunction of literals. A literal is a
parenthesized list of constants called terms. For example, the objects in
Fig. 2 would be described:

E1:  (circle a)(square b)(small a)(small b)
     (ontop a b)
E2:  (circle c)(square d)(circle e)
     (small c)(large d)(small e)
     (ontop c d)(inside e d)

Although these resemble Hayes-Roth's PSRs, they are quite different. There
are no distinguished symbols. All terms are treated uniformly.

The algorithm operates in four steps. First, the literals in each of the
two events to be generalized are matched in all possible ways to generate
the set of matching pairs MP. Two literals match if they contain the same
number of constants and they share a common term in the same position. For
the example of Fig. 2,

MP = { ((circle a),(circle c)),
       ((circle a),(circle e)),
       ((square b),(square d)),
       ((small a),(small c)),
       ((small a),(small e)),
       ((small b),(small c)),
       ((small b),(small e)),
       ((ontop a b),(ontop c d)) }
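
The matching criterion of step 1 is simple enough to sketch directly. Literals are modelled as tuples of strings (an assumption of this sketch), and `literals_match` and `matching_pairs` are hypothetical names rather than Thoth's own.

```python
E1 = [("circle", "a"), ("square", "b"), ("small", "a"), ("small", "b"),
      ("ontop", "a", "b")]
E2 = [("circle", "c"), ("square", "d"), ("circle", "e"),
      ("small", "c"), ("large", "d"), ("small", "e"),
      ("ontop", "c", "d"), ("inside", "e", "d")]

def literals_match(l1, l2):
    """Same number of terms and at least one equal term in the same position."""
    return len(l1) == len(l2) and any(t1 == t2 for t1, t2 in zip(l1, l2))

def matching_pairs(e1, e2):
    """All pairs (literal of e1, literal of e2) that match."""
    return [(l1, l2) for l1 in e1 for l2 in e2 if literals_match(l1, l2)]

MP = matching_pairs(E1, E2)
for pair in MP:
    print(pair)
```

Because every term is treated uniformly, the test does not single out the first position; here the events share no object constants, so only the leading symbols can coincide.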

The second step involves selecting all possible subsets of MP such that no
single literal of one event is paired with more than one literal in another
event. Each of these subsets eventually forms a new generalization of the
original events.

In the third step, each subset of matching pairs selected in step 2 is
extended by adding to the subset additional pairs of literals which did not
previously match. A new pair p is added to a subset S of MP if each
literal in p is related to some other pair q in S by a common constant in a
common position. For example, if S contained the pair ((square b),(square
d)) then we could add to S the pair ((ontop a b),(inside e d)) because the
third element of (ontop a b) is the second element of (square b) and the
third element of (inside e d) is the second element of (square d) (Vere
calls this a 3-2 relationship). We continue adding new pairs until no more
can be added.
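
One reading of the step 3 test is sketched below. It is an interpretation under stated assumptions, not Vere's published procedure: a pair p is taken to be related to a pair q when some position i of both literals of p holds the same constants as some position j of the corresponding literals of q. The names `relationships` and `extend` are hypothetical.

```python
def relationships(p, q):
    """Return the 1-indexed (i, j) positions linking pair p to pair q."""
    (p1, p2), (q1, q2) = p, q
    return [
        (i + 1, j + 1)
        for i in range(min(len(p1), len(p2)))
        for j in range(min(len(q1), len(q2)))
        if p1[i] == q1[j] and p2[i] == q2[j]
    ]

def extend(subset, candidates):
    """Keep adding candidate pairs related to a pair already in the subset."""
    extended = list(subset)
    changed = True
    while changed:
        changed = False
        for p in candidates:
            if p not in extended and any(relationships(p, q) for q in extended):
                extended.append(p)
                changed = True
    return extended

q = (("square", "b"), ("square", "d"))
p = (("ontop", "a", "b"), ("inside", "e", "d"))
unrelated = (("red", "x"), ("big", "y"))

print(relationships(p, q))
print(extend([q], [unrelated, p]))
```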

In step 4 the resulting set of pairs is converted into a new conjunction of
literals by merging each pair to form a single literal. Constants which do
not match are turned into new constants which may be viewed as variables.
For example, ((circle a),(circle c)) would be converted to (circle 1).
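
Step 4 can be sketched as a position-by-position merge. The assumptions here are that literals are tuples and that the same pair of mismatched constants always receives the same new constant; `merge_pairs` is a hypothetical name.

```python
def merge_pairs(pairs):
    """Merge each pair of literals into a single literal.

    Equal terms are kept; a mismatched pair of terms becomes a fresh integer,
    reused whenever the same pair of terms occurs again.
    """
    fresh = {}
    merged = []
    for l1, l2 in pairs:
        literal = []
        for t1, t2 in zip(l1, l2):
            if t1 == t2:
                literal.append(t1)
            else:
                literal.append(fresh.setdefault((t1, t2), len(fresh) + 1))
        merged.append(tuple(literal))
    return merged

pairs = [
    (("circle", "a"), ("circle", "c")),
    (("ontop", "a", "b"), ("ontop", "c", "d")),
    (("ontop", "a", "b"), ("inside", "e", "d")),
]
print(merge_pairs(pairs))
```

The last pair shows how a literal with no meaningful leading symbol can arise: ontop and inside are themselves merged into a new constant.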

Evaluation: 

i)  Representational adequacy. When applied to the test example (Fig.
1) this algorithm produces many generalizations. A few of the significant
ones are listed here:

1.  (ontop 1 2)(medium 1)(large 2)(clear 2)(clear 3)(shaded 4)(5 4)

There is a medium object on top of a large clear object. Another
object is clear. There is a shaded object. (Note also the vacuous
relationship 5 derived from unifying circle and triangle.)

2.  (ontop 1 2)(clear 1)(medium 1)(9 1)(5 3 4)(shaded 3)(7 3)(6 3)(clear
4)(large 4)(8 4)

There is a medium, clear object on top of some other object and there
are two objects related in some way (5) such that one is shaded and
the other is large and clear. (Note the vacuous relationships 6, 7,
8, and 9.)

3.  (ontop 1 2)(medium 1)(clear 2)(large 2)(5 2)(shaded 3)(7 3)(clear
4)(6 4)

There is a medium object on top of a large clear object. There is a
shaded object and there is a clear object. (Note the vacuous
relationships 5, 6, and 7.)

The representation is very general. By convention the first symbol of a
literal can be interpreted as a predicate symbol. The algorithm, however,
treats all constants uniformly. This creates difficulties. For instance,
the algorithm generates vacuous literals in certain situations. Literals
can be formed by pairing (red x) with (big y) to produce meaningless
generalizations. One advantage of this relaxation of semantic constraints is
that the program can discover conjunctive generalizations involving a
many-to-one binding of variables.

The  language  contains  only  a  conjunction  operator.  No  disjunction  or 
internal  disjunction  is  included. 

ii)  Rules  of  generalization.  The  algorithm  implements  the  dropping 
condition  rule  and  the  turning  constants  to  variables  rule. 

iii)  Computational  efficiency.  From  the  published  articles  [21-24]  it 
is  not  clear  how  to  perform  step  2.  The  space  of  possibilities  is  very 
large  and  an  exhaustive  search  could  not  possibly  give  the  computation 
times  which  Vere  has  published.  It  would  be  interesting  to  find  out  what 
heuristics  are  being  used  to  guide  the  search. 

iv) Flexibility and extensibility. Vere has published algorithms
which discover descriptions with disjunctions [23] and exceptions [24]. He
has also developed techniques to generalize relational production rules
[22,23]. The method has been demonstrated using the traditional AI toy
problems of IQ analogy tests and blocks-world sequences. A facility for
using background information to assist the induction process has also been
developed. It uses a spreading activation technique to extract relevant
relations from a knowledge base and add them to the input examples prior to
generalizing them. Since the method has been extended to discover
disjunctions and exceptions, it would be expected that the method could also
operate in noisy environments.
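
The two rules named under item ii) can be illustrated on literals written
as tuples, in the style of the (ontop 1 2) notation above. This is only a
minimal sketch and not Vere's implementation: the function names and the
"?x" convention for variables are assumptions made for illustration. As in
the method described, every symbol in a literal is treated uniformly.

```python
from typing import FrozenSet, Tuple

Literal = Tuple[str, ...]          # e.g. ("ontop", "a", "b")
Conjunction = FrozenSet[Literal]   # a conjunction is a set of literals


def drop_condition(conj: Conjunction, literal: Literal) -> Conjunction:
    """Dropping condition rule: removing a conjunct can only make the
    description more general."""
    return conj - {literal}


def constants_to_variable(conj: Conjunction, constant: str, var: str) -> Conjunction:
    """Turning constants to variables rule: replace every occurrence of
    one constant by a variable (hypothetical '?x' naming convention)."""
    return frozenset(
        tuple(var if term == constant else term for term in lit)
        for lit in conj
    )


# A hypothetical event: a medium object "a" on top of a large object "b".
event = frozenset({("ontop", "a", "b"), ("medium", "a"), ("large", "b")})

g = drop_condition(event, ("large", "b"))
g = constants_to_variable(g, "a", "?x")
g = constants_to_variable(g, "b", "?y")
print(sorted(g))
```

The result describes any medium object on top of some other object, which
is more general than the starting event.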

2.3  Model-driven Methods: Buchanan et al., and Michalski

Model-driven  methods  search  a  set  of  possible  generalizations  in  an  attempt 
to  find  a  few  "best"  hypotheses  which  satisfy  certain  requirements.  The 
two  methods  discussed  here  search  for  a  small  number  of  conjunctions  which 
together  cover  all  of  the  input  events.  The  search  proceeds  by  choosing  as 
the  initial  working  hypothesis  some  starting  point  in  the  partially  ordered 
set  of  all  possible  descriptions.  If  the  working  hypotheses  satisfy  cer- 
tain termination  criteria,  then  the  search  halts.  Otherwise,  the  current 
hypotheses  are  modified  by  slightly  generalizing  or  specializing  them. 
These  new  hypotheses  are  then  checked  to  see  if  they  satisfy  the  termina- 
tion criteria.  The  process  of  modifying  and  checking  continues  until  the 
criteria  are  met.  Top-down  techniques  typically  have  better  noise  immunity 
and  can  easily  be  extended  to  discover  disjunctions.  The  principal  disad- 
vantage of  these  techniques  is  that  the  working  hypotheses  must  repeatedly 
be  checked  to  determine  whether  they  subsume  all  of  the  input  events. 

2.3.1  Buchanan et al.: Program Meta-DENDRAL [1-3,19]

The algorithm which we describe here is taken from the RULEGEN program
(part of the Meta-DENDRAL system). Meta-DENDRAL was designed to discover
cleavage rules to explain mass spectrometry data. The descriptive language
is based on the ball-and-stick model of chemical molecules. Each input
event is a bond environment which describes some portion of a molecule.
The environment is represented by a graph of the atoms in the molecule with
four descriptors attached to each atom and forms the left-hand side of a
cleavage rule. The right-hand side of the rule predicts a cleavage based
on the existence in a molecule of the left-hand side of the rule (breakbond
(**) indicates that the ** bond is predicted to be broken). A typical
cleavage rule (with atoms w, x, y, and z) is:

LEFT-HAND SIDE (BOND ENVIRONMENT):

    Molecule graph:    w ** x - y - z -

    Atom descriptors:

        atom    type        nhs    nbrs    dots
        w       carbon      3      1       0
        x       carbon      2      2       0
        y       nitrogen    1      2       0
        z       carbon      2      2       0

RIGHT-HAND SIDE (CLEAVAGE PREDICTION):

    => Breakbond (**)

The algorithm chooses as its starting point the most general bond
environment ( x ** y ) with no properties specified for either atom. During
the search, this description is grown by successively specializing a property
of one of the atoms in the graph or by adding a new atom to the graph.
After each specialization, the new graph is checked to see if it is
"better" than the parent graph from which it was derived. A daughter graph
is better than its parent if it still covers at least half of the input
rules (it is general enough) and still focuses on only one cleavage process
(it is specific enough). The cleavage rules built by this algorithm are
further improved by the program RULEMOD.

Evaluation: 

i) Representational adequacy. The representation was adequate for the
specific task of developing cleavage rules. It was not intended to be a
general representation for objects outside of the chemical world. The
descriptions can be viewed as conjunctions. Individual rules developed by
the program can be considered to be linked by disjunction.

ii) Rules of generalization. The dropping condition and turning
constants to variables rules are used "in reverse" during the specialization
process. RULEGEN does not seem to have the ability to handle an internal
disjunction but RULEMOD apparently does. For example, it can indicate that
the type of atom is "anything except hydrogen". In similar work on nuclear
magnetic resonance (NMR), Mitchell presents an example in which the value
of nhs is listed as "greater than or equal to one" (which indicates an
internal disjunction).

iii)  Computational  efficiency.  Because  this  is  a  problem-specific  al- 
gorithm, we  cannot  supply  comparison  figures  here  for  how  this  algorithm 
would  work  on  our  test  example.  The  current  program  is  considered  to  be 
relatively  inefficient  [2]. 

iv) Flexibility and extensibility. Meta-DENDRAL has been extended to
handle NMR spectra. The program works well in an errorful environment.
It uses domain-specific knowledge extensively. However, there is no strict
separation between a general-purpose induction component and a special-
purpose knowledge component. It is not clear whether the methods developed
for Meta-DENDRAL could be easily applied to any non-chemical domain. The
program does not perform constructive induction in any general way.
However, the INTSUM program does perform sophisticated transformations on the
input spectra in order to develop the bond-environment descriptions.

2.3.2  Michalski and Dietterich: Program INDUCE 1.2

The algorithm described here is one of three algorithms designed by
Michalski and his collaborators. The others are a data-driven method described
by Stepp [20] and a mixed method described by Larson and Michalski [12,13].
The language used to describe the input events is VL21, an extension to
first-order predicate logic (FOPL) [16]. Each event is represented as a
conjunction of selectors. A selector typically contains a function or
predicate descriptor (with variables as arguments) and a list of values
that the descriptor may assume. The selector [size(x1)=small,medium]
asserts that the size of x1 may take the values small or medium. The events
in Fig. 2 are represented as:

E1:  [size(x1)=small][size(x2)=small]
     [shape(x1)=circle][shape(x2)=square]
     [ontop(x1,x2)]

E2:  [size(x1)=small][size(x2)=large]
     [size(x3)=small][shape(x1)=circle]
     [shape(x2)=square][shape(x3)=circle]
     [ontop(x1,x2)][inside(x3,x2)]

In  this  method,  descriptors  are  divided  into  two  classes:  attribute 
descriptors  and  structure-specifying  descriptors.  Attribute  descriptors 
describe  attributes  such  as  size  or  shape  or  distance  which  are  applicable 
to  all  variables  (representing,  e.g.,  object  parts).  Structure-specifying 
descriptors  include  all  other  descriptors.  They  typically  represent  rela- 
tionships among  variables  such  as  ontop  or  inside.  Each  input  conjunction 
is  broken  into  two  conjuncts — one  built  of  selectors  containing  only  attri- 
bute descriptors  (the  attribute  conjunct)  and  one  built  of  selectors  con- 
taining only  structure-specifying  descriptors  (the  structure  conjunct) . 
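
The split into an attribute conjunct and a structure conjunct can be
sketched as follows. This is an illustration under stated assumptions, not
the INDUCE 1.2 code: the Selector type, the split_event function, and the
choice of which descriptors count as attributes are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Selector(NamedTuple):
    descriptor: str            # e.g. "size" or "ontop"
    args: Tuple[str, ...]      # variables the descriptor applies to
    values: Tuple[str, ...]    # allowed values; empty for a bare predicate


# Assumption for this sketch: these are the attribute descriptors;
# every other descriptor is treated as structure-specifying.
ATTRIBUTE_DESCRIPTORS = {"size", "shape"}


def split_event(event: List[Selector]) -> Tuple[List[Selector], List[Selector]]:
    """Return (attribute conjunct, structure conjunct) for one event."""
    attribute = [s for s in event if s.descriptor in ATTRIBUTE_DESCRIPTORS]
    structure = [s for s in event if s.descriptor not in ATTRIBUTE_DESCRIPTORS]
    return attribute, structure


# Event E2 from the text.
E2 = [
    Selector("size", ("x1",), ("small",)),
    Selector("size", ("x2",), ("large",)),
    Selector("size", ("x3",), ("small",)),
    Selector("shape", ("x1",), ("circle",)),
    Selector("shape", ("x2",), ("square",)),
    Selector("shape", ("x3",), ("circle",)),
    Selector("ontop", ("x1", "x2"), ()),
    Selector("inside", ("x3", "x2"), ()),
]

attribute_conjunct, structure_conjunct = split_event(E2)
print([s.descriptor for s in structure_conjunct])
```

For E2 the structure conjunct holds only the ontop and inside selectors,
which is why it is much smaller than the full event.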

The algorithm is based on the observation that the structure-specifying
descriptors are responsible for the computational complexity of generalizing
structural descriptions. If we could determine conjunctions of
structure-specifying selectors which were relevant for describing a
particular class of objects, then the generalization of the attribute conjuncts
could be handled quickly by an appropriate covering algorithm. The
algorithm seeks to determine such a set of structure conjuncts which appear
likely to be part of a maximally specific conjunctive generalization of all
of the input events. It does this by finding conjunctions which are
maximally specific generalizations of the input structure conjuncts considered
alone. Such conjunctive generalizations of the structure conjuncts must be
contained in some maximally specific generalizations of the entire set of
input events. However, there may be maximally specific conjunctive
generalizations of the input events which contain few if any
structure-specifying selectors. This algorithm also finds these generalizations by
considering structure conjuncts which are less than maximally specific.

The algorithm operates in two phases. The first phase is the structure-
determining phase. A random sample of the input structure conjuncts is
taken. This sample becomes the initial set of generalizations G0. In each
step, Gi is first pruned to a fixed size by removing unpromising
generalizations. Then Gi is checked to see if any of its generalizations covers
all of the structure conjuncts. If any do, they are removed from Gi and
placed in the set C of candidate conjunctive generalizations. Lastly, Gi
is generalized to form Gi+1 by taking each element of Gi and generalizing
it in all possible ways by dropping single selectors. When the set of
candidates C reaches a prespecified size, the search stops.
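
The loop just described can be sketched in a few lines. This is a
simplified illustration, not the INDUCE 1.2 implementation. The names
covers and structure_phase are hypothetical, the cover test tries every
one-to-one binding of variables by brute force, and the pruning rule (keep
the conjuncts that cover the most events, then the longest) is an assumption
because the text does not define "unpromising".

```python
import itertools
import random


def variables(conj):
    """Sorted list of the variables used in a structure conjunct."""
    return sorted({v for _, args in conj for v in args})


def covers(g, s):
    """True if some one-to-one binding of g's variables maps g into s."""
    g_vars, s_vars = variables(g), variables(s)
    for image in itertools.permutations(s_vars, len(g_vars)):
        binding = dict(zip(g_vars, image))
        if all((p, tuple(binding[v] for v in args)) in s for p, args in g):
            return True
    return False


def structure_phase(structures, sample_size=2, beam=5, max_candidates=3, seed=0):
    rng = random.Random(seed)
    g_i = set(rng.sample(structures, min(sample_size, len(structures))))
    candidates = set()
    while g_i and len(candidates) < max_candidates:
        # Prune to a fixed size (assumed ranking: coverage, then specificity).
        ranked = sorted(
            g_i,
            key=lambda g: (-sum(covers(g, s) for s in structures), -len(g), sorted(g)),
        )
        g_i = set(ranked[:beam])
        # Move generalizations that cover every structure conjunct into C.
        covering = {g for g in g_i if all(covers(g, s) for s in structures)}
        candidates |= covering
        g_i -= covering
        # Generalize by dropping single selectors in all possible ways.
        g_i = {g - {selector} for g in g_i for selector in g}
    return candidates


# Structure conjuncts of E1 and E2 from the text.
s1 = frozenset({("ontop", ("x1", "x2"))})
s2 = frozenset({("ontop", ("x1", "x2")), ("inside", ("x3", "x2"))})
result = structure_phase([s1, s2])
print(result)
```

On these two events the candidates are [ontop(x1,x2)] and the empty
structure conjunct. The second corresponds to generalizations that contain
no structure-specifying selectors at all.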

The second phase is the attribute-determining phase. In this phase, the
problem is converted to a multiple-valued logic covering problem using the
VL1 propositional calculus [14,15]. Each candidate cover A in C is matched
against all input events and the relevant variables are identified. For
each match, the appropriate attribute conjuncts are extracted and used to
form a VL1 event. For example,

if   A  = [ontop(p1,p2)]  and

     F1 = [ontop(p1,p2)][ontop(p2,p3)]
          [size(p1)=1][size(p2)=3][size(p3)=5]
          [color(p1)=red][color(p2)=green]
          [color(p3)=blue]

then we get two VL1 events:

     V1 = (1, 3, red, green)  and
     V2 = (3, 5, green, blue).

These are vectors of attributes which correspond here to the descriptors:

     (size(p1), size(p2), color(p1), color(p2))

for p1 and p2 in A.
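
The worked example above can be reproduced with a short sketch. It is an
illustration only: the function name vl1_events and the data layout are
assumptions, and the matching is done by brute-force enumeration of
variable bindings rather than by whatever matcher the program uses.

```python
import itertools

# The input event from the example, split into structure and attributes.
structure = {("ontop", ("p1", "p2")), ("ontop", ("p2", "p3"))}
attributes = {
    ("size", "p1"): 1, ("size", "p2"): 3, ("size", "p3"): 5,
    ("color", "p1"): "red", ("color", "p2"): "green", ("color", "p3"): "blue",
}
candidate = [("ontop", ("p1", "p2"))]   # the candidate cover A


def vl1_events(candidate, structure, attributes, descriptors=("size", "color")):
    """One attribute vector per way of matching the candidate to the event."""
    cand_vars = sorted({v for _, args in candidate for v in args})
    event_vars = sorted({v for _, args in structure for v in args})
    events = []
    for image in itertools.permutations(event_vars, len(cand_vars)):
        binding = dict(zip(cand_vars, image))
        matched = all(
            (pred, tuple(binding[v] for v in args)) in structure
            for pred, args in candidate
        )
        if matched:
            events.append(tuple(
                attributes[(d, binding[v])] for d in descriptors for v in cand_vars
            ))
    return events


print(vl1_events(candidate, structure, attributes))
# [(1, 3, 'red', 'green'), (3, 5, 'green', 'blue')]
```

The candidate matches the event in two ways (p1,p2 bound to the first pair
of objects or to the second), which is why one input event yields two VL1
events.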


All input events are converted into VL1 events in this manner. In general,
more than one VL1 event is created from each input event. The set of VL1
events can be covered using a covering algorithm. A cover could be
obtained by forming the union of the values taken on by each VL1 attribute.
Such an approach usually leads to overgeneralization since only one VL1
event derived from each input event need be covered. We use a beam-search
technique to select a subset of the VL1 events to be covered.
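
The overgeneralization problem can be seen in a small sketch. The union
cover below follows the description in the text. The selection step,
however, is an exhaustive search over one VL1 event per input event, used
here as a simple stand-in for the beam search, and the second input event is
hypothetical data added for illustration.

```python
import itertools
from math import prod


def union_cover(events):
    """Per-attribute union of values (an internal disjunction per attribute)."""
    return [set(column) for column in zip(*events)]


def cover_size(cover):
    """Number of distinct attribute vectors the cover admits."""
    return prod(len(values) for values in cover)


def best_selection(groups):
    """groups[i] holds the VL1 events derived from input event i.
    Pick one event per group so that the union cover is as tight as
    possible (exhaustive stand-in for the beam search)."""
    return min(
        (union_cover(choice) for choice in itertools.product(*groups)),
        key=cover_size,
    )


groups = [
    [(1, 3, "red", "green"), (3, 5, "green", "blue")],  # V1, V2 from the text
    [(1, 3, "red", "blue")],                            # hypothetical second event
]

all_events = [e for group in groups for e in group]
print(cover_size(union_cover(all_events)))   # cover of every VL1 event
print(best_selection(groups))                # cover of one event per input
```

Covering every VL1 event admits 16 attribute vectors, while covering just
one VL1 event per input event admits only 2.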

This two-phase algorithm provides two computational advantages. First, the
time required to compare expressions in the structure-determining phase is
reduced because the structure conjuncts are usually much smaller than the
full input conjuncts. Second, the manipulation of VL1 formulas is very
easy since they may be represented as bit strings and manipulated using
fast bit-parallel operations. The chief disadvantage of this algorithm is
that it is difficult to decide when to terminate the structure-determining
phase.
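
The bit-string representation mentioned above can be sketched as follows.
The encoding (one bit per attribute value, packed into a single integer)
and the names DOMAINS, encode, and covers are assumptions made for
illustration. The text does not specify the actual layout.

```python
# Assumed attribute domains; one bit is reserved for each value.
DOMAINS = {
    "size": ["small", "medium", "large"],
    "shape": ["circle", "square", "rectangle"],
}


def encode(values_by_attr):
    """Pack a VL1 formula into an integer, one bit per (attribute, value)."""
    bits, offset = 0, 0
    for attr, domain in DOMAINS.items():
        for value in values_by_attr[attr]:
            bits |= 1 << (offset + domain.index(value))
        offset += len(domain)
    return bits


def covers(general, specific):
    """A formula covers another if it contains all of its bits."""
    return (general & specific) == specific


e1 = encode({"size": ["small"], "shape": ["circle"]})
e2 = encode({"size": ["medium"], "shape": ["circle"]})
e3 = encode({"size": ["large"], "shape": ["square"]})

# Generalizing by internal disjunction is a single bitwise OR:
# the result stands for [size=small,medium][shape=circle].
g = e1 | e2
print(bin(g), covers(g, e1), covers(g, e3))
```

Both generalization and the cover test reduce to one machine-level
operation per word, independent of how many values each selector lists.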

Evaluation:

i)   Representational  adequacy.   The  algorithm  discovers,  among  others, 
the  following  generalizations  of  the  events  in  Fig.  1: 

1.  [ontop(p1,p2)][size(p1)=medium][shape(p1)=circle,square,rectangle]
    [size(p2)=large][shape(p2)=box,rectangle,ellipse][texture(p2)=clear]

    There is a medium-sized circle, rectangle or square on top of a
    large, clear box, rectangle, or ellipse.

2.  [ontop(p1,p2)][size(p1)=medium][shape(p1)=polygon]
    [texture(p1)=clear][size(p2)=medium,large]
    [shape(p2)=rectangle,circle]

    There is a clear, medium-sized polygon on top of a medium or large
    circle or rectangle.

3.  [ontop(p1,p2)][size(p1)=medium][shape(p1)=polygon]
    [size(p2)=medium,large][shape(p2)=rectangle,ellipse,circle]

    There is a medium-sized polygon on top of a large or medium
    rectangle, ellipse or circle.

4.  [size(p1)=small,medium][shape(p1)=circle,rectangle]
    [texture(p1)=shaded]

    There is a shaded object which is either medium or small in size and
    has a circular or rectangular shape.

This  algorithm  implements  the  conjunction,  disjunction  and  internal  dis- 
junction operators.  It  provides  a  fairly  non-uniform  set  of  representa- 
tional facilities.  Descriptors,  variables,  and  values  are  all  dis- 
tinguished. Descriptors  are  further  analyzed  into  structure-specifying 
descriptors  and  attribute  descriptors.  The  current  method  provides  for 
descriptors  which  have  unordered,  linearly  ordered,  and  tree  ordered  value 
sets.  This  variety  of  possible  representations  permits  a  better  "fit" 
between  the  description  language  and  any  specific  problem. 

ii)  Rules  of  generalization.  The  algorithm  uses  all  rules  mentioned 
in  section  1.4  and  also  a  few  constructive  induction  rules  (see  below). 
All  constants  are  coded  as  variables.  The  effect  of  the  turning-constants 
to  variables  rule  is  achieved  as  a  special  case  of  the  generalization  by 
internal  disjunction  rule. 

iii) Computational efficiency. The algorithm requires 28 comparisons
and builds 13 rules during the search to develop the descriptions listed
above. Four rules are retained so this gives an efficiency ratio of 4/13
or 30%.

iv) Flexibility and extensibility. The algorithm can easily discover
disjunctions by altering the termination criteria for the structure-
determining phase to accept structure conjuncts which do not necessarily
cover all of the input events. The same general two-phase approach can
also be applied to problems of determining discriminant generalizations.
Larson and Michalski have done work on determining discriminant
classification rules [12,13,14].

The  algorithm  has  good  noise  immunity.  Noise  events  can  be  discovered  be- 
cause the  algorithm  tends  to  place  them  in  separate  terms  of  a  disjunction. 

Domain-specific  knowledge  can  be  incorporated  into  the  program  by  defining 
the  domains  of  descriptors,  specifying  the  structures  of  these  domains, 
specifying  certain  simple  production  rules,  and  by  providing  constructive 
induction  rules.  These  forms  of  knowledge  representation  are  not  always 
convenient,  however.  Further  work  should  provide  other  facilities  for 
knowledge  representation. 

A  few  simple  constructive  induction  rules  have  been  incorporated  into  the 
current  implementation  as  a  preprocessor.  Other  constructive  induction 
rules  can  be  specified  by  the  user.  Using  the  built-in  constructive  induc- 
tion rules,  the  program  produces  the  following  conjunctive  generalization 
of  the  input  events  in  Fig.  1: 

     [# p's with texture clear=2][top-most(p1)]
     [ontop(p1,p2)][size(p1)=medium]
     [shape(p1)=polygon][texture(p1)=clear]
     [size(p2)=medium,large]
     [shape(p2)=circle,rectangle]

There are exactly two clear objects in each event. The top-most object
is a medium-sized, clear polygon and it is on top of a large or medium-
sized circle or rectangle.

We hope to expand this constructive induction facility in the future.

2.4  Summary

The  comparison  of  various  methods  is  summarized  in  Fig.  3.  The  table  shows 
the  distinct  advantages  and  disadvantages  of  top-down  methods  as  opposed  to 
bottom-up  methods.  Bottom-up  methods  tend  to  be  faster  but  noise  immunity 
and  flexibility  suffer  as  a  consequence.  Top-down  methods  have  good  noise 
immunity  and  are  easily  modified  to  discover  disjunctive  and  other  forms  of 
generalization.  They  do  tend  to  be  computationally  more  expensive.  By 
separating  the  structure-determining  phase  from  the  attribute-determining 
phase  in  our  method,  a  considerable  speed-up  has  been  achieved. 

3.0  CONCLUSION 

One  of  the  problems  of  current  research  on  induction  is  that  each   research 
group  is  using  a  different  formal  language  and  terminology.   This  makes  the 
exchange  of  information  difficult.   This  paper  was  intended  to  help  readers  to 
get  a  better  understanding  of  the  state  of  the  art  in  this  area. 

Some  important  problems  to  be  addressed  in  future  research  include: 

i) the development of adequate formal languages and knowledge
representations for hypothesis formulation and modification;

ii) extension of the scopes of operators and forms which an inductive
program can efficiently use during hypothesis formulation;

iii) the development of general mechanisms of induction which can be
guided by problem-specific packets of knowledge; and

iv) incorporation in the program of extensive facilities for
constructive induction and multi-level schemes of description. In particular, an
inductive program should be able to assign names to various subdescriptions
and use these names in the formulation of hypotheses (i.e., generate
hierarchical forms).

Finally, an important principle which should guide future research is what
we call the principle of comprehensibility. This principle states that the
descriptions which an AI program uses and the concepts which it generates
should be easily comprehensible by people. In the context of work on
induction, the comprehensibility principle requires that the descriptions be
short and use operators which can be easily interpreted in natural
language. Furthermore, systems should be designed to provide flexible
interactive facilities. This approach has been adopted in our work because
we expect that the most significant applications of AI inductive programs
will be as interactive tools for conceptual data analysis and computer-
aided acquisition of rules for knowledge-based expert systems.